{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "83GJJF9fAgyP"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_5_kaggle_project.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HL640ydsAgyQ"
   },
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 8: Kaggle Data Sets**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a4ih9V7vAgyR"
   },
   "source": [
    "# Module 8 Material\n",
    "\n",
    "* Part 8.1: Introduction to Kaggle [[Video]](https://www.youtube.com/watch?v=v4lJBhdCuCU&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_1_kaggle_intro.ipynb)\n",
    "* Part 8.2: Building Ensembles with Scikit-Learn and Keras [[Video]](https://www.youtube.com/watch?v=LQ-9ZRBLasw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_2_keras_ensembles.ipynb)\n",
    "* Part 8.3: How Should you Architect Your Keras Neural Network: Hyperparameters [[Video]](https://www.youtube.com/watch?v=1q9klwSoUQw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_3_keras_hyperparameters.ipynb)\n",
    "* Part 8.4: Bayesian Hyperparameter Optimization for Keras [[Video]](https://www.youtube.com/watch?v=sXdxyUCCm8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_4_bayesian_hyperparameter_opt.ipynb)\n",
    "* **Part 8.5: Current Semester's Kaggle** [[Video]](https://www.youtube.com/watch?v=PHQt0aUasRg&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_5_kaggle_project.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uU7OTe1DAgyR"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "NOdFRzaXAgyS",
    "outputId": "2475bc8b-19b2-487a-916a-3667060e76cf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "# Start CoLab\n",
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# Nicely formatted time string\n",
    "def hms_string(sec_elapsed):\n",
    "    h = int(sec_elapsed / (60 * 60))\n",
    "    m = int((sec_elapsed % (60 * 60)) / 60)\n",
    "    s = sec_elapsed % 60\n",
    "    return \"{}:{:>02}:{:>05.2f}\".format(h, m, s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LFMTMsOWAgyS"
   },
   "source": [
    "# Part 8.5: Current Semester's Kaggle\n",
    "\n",
    "Kaggke competition site for current semester:\n",
    "* [Spring 2023 Kaggle Assignment](https://www.kaggle.com/competitions/applications-of-deep-learning-wustlspring-2023)\n",
    "\n",
    "Previous Kaggle competition sites for this class (NOT this semester's assignment, feel free to use code):\n",
    "* [Fall 2022 Kaggle Assignment](https://www.kaggle.com/competitions/applications-of-deep-learning-wustlfall-2022)\n",
    "* [Spring 2022 Kaggle Assignment](https://www.kaggle.com/c/tsp-cv)\n",
    "* [Fall 2021 Kaggle Assignment](https://www.kaggle.com/c/applications-of-deep-learning-wustlfall-2021)\n",
    "* [Spring 2021 Kaggle Assignment](https://www.kaggle.com/c/applications-of-deep-learning-wustl-spring-2021b)\n",
    "* [Fall 2020 Kaggle Assignment](https://www.kaggle.com/c/applications-of-deep-learning-wustl-fall-2020)\n",
    "* [Spring 2020 Kaggle Assignment](https://www.kaggle.com/c/applications-of-deep-learningwustl-spring-2020)\n",
    "* [Fall 2019 Kaggle Assignment](https://kaggle.com/c/applications-of-deep-learningwustl-fall-2019)\n",
    "* [Spring 2019 Kaggle Assignment](https://www.kaggle.com/c/applications-of-deep-learningwustl-spring-2019)\n",
    "* [Fall 2018 Kaggle Assignment](https://www.kaggle.com/c/wustl-t81-558-washu-deep-learning-fall-2018)\n",
    "* [Spring 2018 Kaggle Assignment](https://www.kaggle.com/c/wustl-t81-558-washu-deep-learning-spring-2018)\n",
    "* [Fall 2017 Kaggle Assignment](https://www.kaggle.com/c/wustl-t81-558-washu-deep-learning-fall-2017)\n",
    "* [Spring 2017 Kaggle Assignment](https://inclass.kaggle.com/c/applications-of-deep-learning-wustl-spring-2017)\n",
    "* [Fall 2016 Kaggle Assignment](https://inclass.kaggle.com/c/wustl-t81-558-washu-deep-learning-fall-2016)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "p4eUCyQaAgyT"
   },
   "source": [
    "## Iris as a Kaggle Competition\n",
    "\n",
    "If I used the Iris data as a Kaggle, I would give you the following three files:\n",
    "\n",
    "* [kaggle_iris_test.csv](https://data.heatonresearch.com/data/t81-558/datasets/kaggle_iris_test.csv) - The data that Kaggle will evaluate you on. It contains only input; you must provide answers.  (contains x)\n",
    "* [kaggle_iris_train.csv](https://data.heatonresearch.com/data/t81-558/datasets/kaggle_iris_train.csv) - The data that you will use to train. (contains x and y)\n",
    "* [kaggle_iris_sample.csv](https://data.heatonresearch.com/data/t81-558/datasets/kaggle_iris_sample.csv) - A sample submission for Kaggle. (contains x and y)\n",
    "\n",
    "Important features of the Kaggle iris files (that differ from how we've previously seen files):\n",
    "\n",
    "* The iris species is already index encoded.\n",
    "* Your training data is in a separate file.\n",
    "* You will load the test data to generate a submission file.\n",
    "\n",
    "The following program generates a submission file for \"Iris Kaggle\". You can use it as a starting point for assignment 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "FoBv4ji_AgyT",
    "outputId": "c9fc1ce4-aab9-4190-e539-31b5a1559a16"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of classes: 3\n",
      "Restoring model weights from the end of the best epoch: 103.\n",
      "Epoch 108: early stopping\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.History at 0x7f05e7452710>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation\n",
    "from tensorflow.keras.callbacks import EarlyStopping\n",
    "\n",
    "df_train = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/datasets/\"+\\\n",
    "    \"kaggle_iris_train.csv\", na_values=['NA','?'])\n",
    "\n",
    "# Encode feature vector\n",
    "df_train.drop('id', axis=1, inplace=True)\n",
    "\n",
    "num_classes = len(df_train.groupby('species').species.nunique())\n",
    "\n",
    "print(\"Number of classes: {}\".format(num_classes))\n",
    "\n",
    "# Convert to numpy - Classification\n",
    "x = df_train[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values\n",
    "dummies = pd.get_dummies(df_train['species']) # Classification\n",
    "species = dummies.columns\n",
    "y = dummies.values\n",
    "    \n",
    "# Split into train/test\n",
    "x_train, x_test, y_train, y_test = train_test_split(    \n",
    "    x, y, test_size=0.25, random_state=45)\n",
    "\n",
    "# Train, with early stopping\n",
    "model = Sequential()\n",
    "model.add(Dense(50, input_dim=x.shape[1], activation='relu'))\n",
    "model.add(Dense(25))\n",
    "model.add(Dense(y.shape[1],activation='softmax'))\n",
    "model.compile(loss='categorical_crossentropy', optimizer='adam')\n",
    "monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, \n",
    "                        patience=5, verbose=1, mode='auto',\n",
    "                       restore_best_weights=True)\n",
    "\n",
    "model.fit(x_train,y_train,validation_data=(x_test,y_test),\n",
    "          callbacks=[monitor],verbose=0,epochs=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u5A6iWVhAgyU"
   },
   "source": [
    "Now that we've trained the neural network, we can check its log loss."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "dX2DIswHAgyU",
    "outputId": "79b55679-114e-4ff9-8b72-00ddb8a65746"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Log loss score: 0.10988010508939623\n"
     ]
    }
   ],
   "source": [
    "from sklearn import metrics\n",
    "\n",
    "# Calculate multi log loss error\n",
    "pred = model.predict(x_test)\n",
    "score = metrics.log_loss(y_test, pred)\n",
    "print(\"Log loss score: {}\".format(score))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Hmf6QKjdAgyV"
   },
   "source": [
    "Now we are ready to generate the Kaggle submission file.  We will use the iris test data that does not contain a $y$ target value.  It is our job to predict this value and submit it to Kaggle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Fc5roTyDAgyV",
    "outputId": "c1fcbc80-4d56-4ff5-a353-dd45d3bb760d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    id  species-0  species-1  species-2\n",
      "0  100   0.022300   0.777859   0.199841\n",
      "1  101   0.001309   0.273849   0.724842\n",
      "2  102   0.001153   0.319349   0.679498\n",
      "3  103   0.958006   0.041989   0.000005\n",
      "4  104   0.976932   0.023066   0.000002\n"
     ]
    }
   ],
   "source": [
    "# Generate Kaggle submit file\n",
    "\n",
    "# Encode feature vector\n",
    "df_test = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/datasets/\"+\\\n",
    "    \"kaggle_iris_test.csv\", na_values=['NA','?'])\n",
    "\n",
    "# Convert to numpy - Classification\n",
    "ids = df_test['id']\n",
    "df_test.drop('id', axis=1, inplace=True)\n",
    "x = df_test[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values\n",
    "y = dummies.values\n",
    "\n",
    "# Generate predictions\n",
    "pred = model.predict(x)\n",
    "#pred\n",
    "\n",
    "# Create submission data set\n",
    "\n",
    "df_submit = pd.DataFrame(pred)\n",
    "df_submit.insert(0,'id',ids)\n",
    "df_submit.columns = ['id','species-0','species-1','species-2']\n",
    "\n",
    "# Write submit file locally\n",
    "df_submit.to_csv(\"iris_submit.csv\", index=False) \n",
    "\n",
    "print(df_submit[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Mw5ZEszvAgyV"
   },
   "source": [
    "## MPG as a Kaggle Competition (Regression)\n",
    "\n",
    "If the Auto MPG data were used as a Kaggle, you would be given the following three files:\n",
    "\n",
    "* [kaggle_mpg_test.csv](https://data.heatonresearch.com/data/t81-558/datasets/kaggle_auto_test.csv) - The data that Kaggle will evaluate you on.  Contains only input, you must provide answers.  (contains x)\n",
    "* [kaggle_mpg_train.csv](https://data.heatonresearch.com/data/t81-558/datasets/kaggle_auto_test.csv) - The data that you will use to train. (contains x and y)\n",
    "* [kaggle_mpg_sample.csv](https://data.heatonresearch.com/data/t81-558/datasets/kaggle_auto_sample.csv) - A sample submission for Kaggle. (contains x and y)\n",
    "\n",
    "Important features of the Kaggle iris files (that differ from how we've previously seen files):\n",
    "\n",
    "The following program generates a submission file for \"MPG Kaggle\".  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "JjZ1Q_HpAgyV",
    "outputId": "00ed3905-be90-4a2e-9834-6cd57ac042c2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/1000\n",
      "9/9 - 1s - loss: 1797.5945 - val_loss: 1272.4421 - 1s/epoch - 144ms/step\n",
      "Epoch 2/1000\n",
      "9/9 - 0s - loss: 574.7726 - val_loss: 734.3082 - 92ms/epoch - 10ms/step\n",
      "Epoch 3/1000\n",
      "9/9 - 0s - loss: 487.3118 - val_loss: 446.3558 - 76ms/epoch - 8ms/step\n",
      "Epoch 4/1000\n",
      "9/9 - 0s - loss: 326.7128 - val_loss: 321.7191 - 96ms/epoch - 11ms/step\n",
      "Epoch 5/1000\n",
      "9/9 - 0s - loss: 294.8217 - val_loss: 271.3473 - 70ms/epoch - 8ms/step\n",
      "Epoch 6/1000\n",
      "9/9 - 0s - loss: 259.8376 - val_loss: 239.6796 - 116ms/epoch - 13ms/step\n",
      "Epoch 7/1000\n",
      "9/9 - 0s - loss: 250.4708 - val_loss: 227.4295 - 73ms/epoch - 8ms/step\n",
      "Epoch 8/1000\n",
      "9/9 - 0s - loss: 227.1252 - val_loss: 198.4167 - 125ms/epoch - 14ms/step\n",
      "Epoch 9/1000\n",
      "9/9 - 0s - loss: 225.6681 - val_loss: 195.5055 - 95ms/epoch - 11ms/step\n",
      "Epoch 10/1000\n",
      "9/9 - 0s - loss: 209.1198 - val_loss: 184.1092 - 121ms/epoch - 13ms/step\n",
      "Epoch 11/1000\n",
      "9/9 - 0s - loss: 195.4801 - val_loss: 176.0311 - 108ms/epoch - 12ms/step\n",
      "Epoch 12/1000\n",
      "9/9 - 0s - loss: 198.6493 - val_loss: 168.1613 - 163ms/epoch - 18ms/step\n",
      "Epoch 13/1000\n",
      "9/9 - 0s - loss: 198.5606 - val_loss: 196.0306 - 114ms/epoch - 13ms/step\n",
      "Epoch 14/1000\n",
      "9/9 - 0s - loss: 184.3067 - val_loss: 179.8450 - 99ms/epoch - 11ms/step\n",
      "Epoch 15/1000\n",
      "9/9 - 0s - loss: 178.6627 - val_loss: 148.1014 - 80ms/epoch - 9ms/step\n",
      "Epoch 16/1000\n",
      "9/9 - 0s - loss: 154.0201 - val_loss: 129.9253 - 74ms/epoch - 8ms/step\n",
      "Epoch 17/1000\n",
      "9/9 - 0s - loss: 145.2373 - val_loss: 124.0609 - 79ms/epoch - 9ms/step\n",
      "Epoch 18/1000\n",
      "9/9 - 0s - loss: 140.0318 - val_loss: 116.7844 - 86ms/epoch - 10ms/step\n",
      "Epoch 19/1000\n",
      "9/9 - 0s - loss: 135.1688 - val_loss: 115.0745 - 136ms/epoch - 15ms/step\n",
      "Epoch 20/1000\n",
      "9/9 - 0s - loss: 132.8391 - val_loss: 106.9831 - 169ms/epoch - 19ms/step\n",
      "Epoch 21/1000\n",
      "9/9 - 0s - loss: 123.6673 - val_loss: 105.7211 - 95ms/epoch - 11ms/step\n",
      "Epoch 22/1000\n",
      "9/9 - 0s - loss: 123.7169 - val_loss: 99.6713 - 112ms/epoch - 12ms/step\n",
      "Epoch 23/1000\n",
      "9/9 - 0s - loss: 118.0815 - val_loss: 96.0683 - 150ms/epoch - 17ms/step\n",
      "Epoch 24/1000\n",
      "9/9 - 0s - loss: 114.6363 - val_loss: 99.1486 - 153ms/epoch - 17ms/step\n",
      "Epoch 25/1000\n",
      "9/9 - 0s - loss: 112.3965 - val_loss: 93.8642 - 180ms/epoch - 20ms/step\n",
      "Epoch 26/1000\n",
      "9/9 - 0s - loss: 111.2470 - val_loss: 88.3417 - 139ms/epoch - 15ms/step\n",
      "Epoch 27/1000\n",
      "9/9 - 0s - loss: 107.8639 - val_loss: 86.7927 - 135ms/epoch - 15ms/step\n",
      "Epoch 28/1000\n",
      "9/9 - 0s - loss: 103.0426 - val_loss: 89.0441 - 101ms/epoch - 11ms/step\n",
      "Epoch 29/1000\n",
      "9/9 - 0s - loss: 110.6277 - val_loss: 82.4294 - 159ms/epoch - 18ms/step\n",
      "Epoch 30/1000\n",
      "9/9 - 0s - loss: 100.3681 - val_loss: 90.8037 - 82ms/epoch - 9ms/step\n",
      "Epoch 31/1000\n",
      "9/9 - 0s - loss: 105.4711 - val_loss: 79.2106 - 76ms/epoch - 8ms/step\n",
      "Epoch 32/1000\n",
      "9/9 - 0s - loss: 98.7603 - val_loss: 79.9620 - 73ms/epoch - 8ms/step\n",
      "Epoch 33/1000\n",
      "9/9 - 0s - loss: 94.7678 - val_loss: 76.8616 - 78ms/epoch - 9ms/step\n",
      "Epoch 34/1000\n",
      "9/9 - 0s - loss: 93.8199 - val_loss: 77.0823 - 76ms/epoch - 8ms/step\n",
      "Epoch 35/1000\n",
      "9/9 - 0s - loss: 94.8746 - val_loss: 73.9967 - 62ms/epoch - 7ms/step\n",
      "Epoch 36/1000\n",
      "9/9 - 0s - loss: 95.3178 - val_loss: 73.0059 - 60ms/epoch - 7ms/step\n",
      "Epoch 37/1000\n",
      "9/9 - 0s - loss: 91.1315 - val_loss: 80.8389 - 57ms/epoch - 6ms/step\n",
      "Epoch 38/1000\n",
      "9/9 - 0s - loss: 96.4810 - val_loss: 77.8854 - 59ms/epoch - 7ms/step\n",
      "Epoch 39/1000\n",
      "9/9 - 0s - loss: 91.1039 - val_loss: 69.9539 - 40ms/epoch - 4ms/step\n",
      "Epoch 40/1000\n",
      "9/9 - 0s - loss: 86.9596 - val_loss: 69.3511 - 43ms/epoch - 5ms/step\n",
      "Epoch 41/1000\n",
      "9/9 - 0s - loss: 87.6142 - val_loss: 70.1390 - 57ms/epoch - 6ms/step\n",
      "Epoch 42/1000\n",
      "9/9 - 0s - loss: 88.0185 - val_loss: 73.6168 - 38ms/epoch - 4ms/step\n",
      "Epoch 43/1000\n",
      "9/9 - 0s - loss: 92.8655 - val_loss: 67.5213 - 38ms/epoch - 4ms/step\n",
      "Epoch 44/1000\n",
      "9/9 - 0s - loss: 88.5278 - val_loss: 69.9708 - 59ms/epoch - 7ms/step\n",
      "Epoch 45/1000\n",
      "9/9 - 0s - loss: 82.9339 - val_loss: 70.3786 - 39ms/epoch - 4ms/step\n",
      "Epoch 46/1000\n",
      "9/9 - 0s - loss: 81.7092 - val_loss: 63.3550 - 59ms/epoch - 7ms/step\n",
      "Epoch 47/1000\n",
      "9/9 - 0s - loss: 81.1514 - val_loss: 78.7681 - 59ms/epoch - 7ms/step\n",
      "Epoch 48/1000\n",
      "9/9 - 0s - loss: 99.3562 - val_loss: 62.8894 - 64ms/epoch - 7ms/step\n",
      "Epoch 49/1000\n",
      "9/9 - 0s - loss: 96.8292 - val_loss: 67.8047 - 55ms/epoch - 6ms/step\n",
      "Epoch 50/1000\n",
      "9/9 - 0s - loss: 88.7995 - val_loss: 67.5249 - 56ms/epoch - 6ms/step\n",
      "Epoch 51/1000\n",
      "9/9 - 0s - loss: 80.6064 - val_loss: 96.2975 - 58ms/epoch - 6ms/step\n",
      "Epoch 52/1000\n",
      "9/9 - 0s - loss: 95.2732 - val_loss: 62.4323 - 39ms/epoch - 4ms/step\n",
      "Epoch 53/1000\n",
      "9/9 - 0s - loss: 75.1992 - val_loss: 64.0174 - 39ms/epoch - 4ms/step\n",
      "Epoch 54/1000\n",
      "9/9 - 0s - loss: 75.5173 - val_loss: 57.8594 - 40ms/epoch - 4ms/step\n",
      "Epoch 55/1000\n",
      "9/9 - 0s - loss: 72.6369 - val_loss: 56.2216 - 47ms/epoch - 5ms/step\n",
      "Epoch 56/1000\n",
      "9/9 - 0s - loss: 72.8636 - val_loss: 55.3956 - 54ms/epoch - 6ms/step\n",
      "Epoch 57/1000\n",
      "9/9 - 0s - loss: 69.0251 - val_loss: 70.7940 - 56ms/epoch - 6ms/step\n",
      "Epoch 58/1000\n",
      "9/9 - 0s - loss: 75.8152 - val_loss: 63.7728 - 37ms/epoch - 4ms/step\n",
      "Epoch 59/1000\n",
      "9/9 - 0s - loss: 71.6866 - val_loss: 59.5908 - 41ms/epoch - 5ms/step\n",
      "Epoch 60/1000\n",
      "9/9 - 0s - loss: 69.3349 - val_loss: 52.7848 - 38ms/epoch - 4ms/step\n",
      "Epoch 61/1000\n",
      "9/9 - 0s - loss: 67.8410 - val_loss: 53.5977 - 54ms/epoch - 6ms/step\n",
      "Epoch 62/1000\n",
      "9/9 - 0s - loss: 68.4640 - val_loss: 53.6664 - 39ms/epoch - 4ms/step\n",
      "Epoch 63/1000\n",
      "9/9 - 0s - loss: 63.7229 - val_loss: 52.4224 - 44ms/epoch - 5ms/step\n",
      "Epoch 64/1000\n",
      "9/9 - 0s - loss: 69.8485 - val_loss: 59.1973 - 53ms/epoch - 6ms/step\n",
      "Epoch 65/1000\n",
      "9/9 - 0s - loss: 75.7193 - val_loss: 70.1342 - 37ms/epoch - 4ms/step\n",
      "Epoch 66/1000\n",
      "9/9 - 0s - loss: 87.7418 - val_loss: 55.3687 - 38ms/epoch - 4ms/step\n",
      "Epoch 67/1000\n",
      "9/9 - 0s - loss: 72.8599 - val_loss: 52.9028 - 44ms/epoch - 5ms/step\n",
      "Epoch 68/1000\n",
      "9/9 - 0s - loss: 69.9528 - val_loss: 49.9109 - 38ms/epoch - 4ms/step\n",
      "Epoch 69/1000\n",
      "9/9 - 0s - loss: 62.7782 - val_loss: 46.6361 - 39ms/epoch - 4ms/step\n",
      "Epoch 70/1000\n",
      "9/9 - 0s - loss: 58.4024 - val_loss: 50.8190 - 38ms/epoch - 4ms/step\n",
      "Epoch 71/1000\n",
      "9/9 - 0s - loss: 63.5687 - val_loss: 46.6161 - 44ms/epoch - 5ms/step\n",
      "Epoch 72/1000\n",
      "9/9 - 0s - loss: 65.9290 - val_loss: 47.1278 - 40ms/epoch - 4ms/step\n",
      "Epoch 73/1000\n",
      "9/9 - 0s - loss: 74.9235 - val_loss: 61.1282 - 42ms/epoch - 5ms/step\n",
      "Epoch 74/1000\n",
      "9/9 - 0s - loss: 63.6773 - val_loss: 45.0233 - 39ms/epoch - 4ms/step\n",
      "Epoch 75/1000\n",
      "9/9 - 0s - loss: 55.8287 - val_loss: 59.8986 - 41ms/epoch - 5ms/step\n",
      "Epoch 76/1000\n",
      "9/9 - 0s - loss: 58.9969 - val_loss: 52.0535 - 39ms/epoch - 4ms/step\n",
      "Epoch 77/1000\n",
      "9/9 - 0s - loss: 60.7104 - val_loss: 43.0530 - 46ms/epoch - 5ms/step\n",
      "Epoch 78/1000\n",
      "9/9 - 0s - loss: 59.7358 - val_loss: 45.3669 - 41ms/epoch - 5ms/step\n",
      "Epoch 79/1000\n",
      "9/9 - 0s - loss: 60.9792 - val_loss: 40.7967 - 41ms/epoch - 5ms/step\n",
      "Epoch 80/1000\n",
      "9/9 - 0s - loss: 58.0294 - val_loss: 49.0612 - 42ms/epoch - 5ms/step\n",
      "Epoch 81/1000\n",
      "9/9 - 0s - loss: 57.6733 - val_loss: 41.7604 - 44ms/epoch - 5ms/step\n",
      "Epoch 82/1000\n",
      "9/9 - 0s - loss: 50.3309 - val_loss: 39.1461 - 38ms/epoch - 4ms/step\n",
      "Epoch 83/1000\n",
      "9/9 - 0s - loss: 54.2316 - val_loss: 40.8561 - 36ms/epoch - 4ms/step\n",
      "Epoch 84/1000\n",
      "9/9 - 0s - loss: 66.4084 - val_loss: 38.1869 - 60ms/epoch - 7ms/step\n",
      "Epoch 85/1000\n",
      "9/9 - 0s - loss: 50.0778 - val_loss: 37.8852 - 56ms/epoch - 6ms/step\n",
      "Epoch 86/1000\n",
      "9/9 - 0s - loss: 47.0763 - val_loss: 37.3743 - 39ms/epoch - 4ms/step\n",
      "Epoch 87/1000\n",
      "9/9 - 0s - loss: 46.1752 - val_loss: 45.8444 - 45ms/epoch - 5ms/step\n",
      "Epoch 88/1000\n",
      "9/9 - 0s - loss: 49.4047 - val_loss: 37.3778 - 40ms/epoch - 4ms/step\n",
      "Epoch 89/1000\n",
      "9/9 - 0s - loss: 46.5478 - val_loss: 36.2859 - 38ms/epoch - 4ms/step\n",
      "Epoch 90/1000\n",
      "9/9 - 0s - loss: 44.7429 - val_loss: 47.2213 - 38ms/epoch - 4ms/step\n",
      "Epoch 91/1000\n",
      "9/9 - 0s - loss: 49.7726 - val_loss: 52.5501 - 42ms/epoch - 5ms/step\n",
      "Epoch 92/1000\n",
      "9/9 - 0s - loss: 53.5449 - val_loss: 62.3078 - 57ms/epoch - 6ms/step\n",
      "Epoch 93/1000\n",
      "9/9 - 0s - loss: 54.7558 - val_loss: 51.2010 - 43ms/epoch - 5ms/step\n",
      "Epoch 94/1000\n",
      "Restoring model weights from the end of the best epoch: 89.\n",
      "9/9 - 0s - loss: 52.3631 - val_loss: 42.2640 - 69ms/epoch - 8ms/step\n",
      "Epoch 94: early stopping\n"
     ]
    }
   ],
   "source": [
    "# HIDE OUTPUT\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation\n",
    "from sklearn.model_selection import train_test_split\n",
    "from tensorflow.keras.callbacks import EarlyStopping\n",
    "import pandas as pd\n",
    "import io\n",
    "import os\n",
    "import requests\n",
    "import numpy as np\n",
    "from sklearn import metrics\n",
    "\n",
    "save_path = \".\"\n",
    "\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/datasets/\"+\\\n",
    "    \"kaggle_auto_train.csv\", \n",
    "    na_values=['NA', '?'])\n",
    "\n",
    "cars = df['name']\n",
    "\n",
    "# Handle missing value\n",
    "df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())\n",
    "\n",
    "# Pandas to Numpy\n",
    "x = df[['cylinders', 'displacement', 'horsepower', 'weight',\n",
    "       'acceleration', 'year', 'origin']].values\n",
    "y = df['mpg'].values # regression\n",
    "\n",
    "# Split into train/test\n",
    "x_train, x_test, y_train, y_test = train_test_split(    \n",
    "    x, y, test_size=0.25, random_state=42)\n",
    "\n",
    "# Build the neural network\n",
    "model = Sequential()\n",
    "model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1\n",
    "model.add(Dense(10, activation='relu')) # Hidden 2\n",
    "model.add(Dense(1)) # Output\n",
    "model.compile(loss='mean_squared_error', optimizer='adam')\n",
    "monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, \n",
    "                        verbose=1, mode='auto', restore_best_weights=True)\n",
    "model.fit(x_train,y_train,validation_data=(x_test,y_test),\n",
    "          verbose=2,callbacks=[monitor],epochs=1000)\n",
    "\n",
    "# Predict\n",
    "pred = model.predict(x_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BFJcZDy6AgyV"
   },
   "source": [
    "Now that we've trained the neural network, we can check its RMSE error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-8zshQm0AgyV",
    "outputId": "b5e8d691-798b-445e-f44b-a997fad1ab6b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final score (RMSE): 6.023776405947501\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Measure RMSE error.  RMSE is common for regression.\n",
    "score = np.sqrt(metrics.mean_squared_error(pred,y_test))\n",
    "print(\"Final score (RMSE): {}\".format(score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZQf79HgwAgyW"
   },
   "source": [
    "Now we are ready to generate the Kaggle submission file.  We will use the MPG test data that does not contain a $y$ target value.  It is our job to predict this value and submit it to Kaggle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Y16gAEzkAgyW",
    "outputId": "fa7a3a20-f462-48b0-f154-a4b7eeaa66f3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    id        mpg\n",
      "0  350  27.158819\n",
      "1  351  24.450621\n",
      "2  352  24.913355\n",
      "3  353  26.994867\n",
      "4  354  26.669268\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Generate Kaggle submit file\n",
    "\n",
    "# Encode feature vector\n",
    "df_test = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/datasets/\"+\\\n",
    "    \"kaggle_auto_test.csv\", na_values=['NA','?'])\n",
    "\n",
    "# Convert to numpy - regression\n",
    "ids = df_test['id']\n",
    "df_test.drop('id', axis=1, inplace=True)\n",
    "\n",
    "# Handle missing value\n",
    "df_test['horsepower'] = df_test['horsepower'].\\\n",
    "    fillna(df['horsepower'].median())\n",
    "\n",
    "x = df_test[['cylinders', 'displacement', 'horsepower', 'weight',\n",
    "       'acceleration', 'year', 'origin']].values\n",
    "\n",
    "# Generate predictions\n",
    "pred = model.predict(x)\n",
    "#pred\n",
    "\n",
    "# Create submission data set\n",
    "\n",
    "df_submit = pd.DataFrame(pred)\n",
    "df_submit.insert(0,'id',ids)\n",
    "df_submit.columns = ['id','mpg']\n",
    "\n",
    "# Write submit file locally\n",
    "df_submit.to_csv(\"auto_submit.csv\", index=False) \n",
    "\n",
    "print(df_submit[:5])"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "collapsed_sections": [],
   "name": "Copy of t81_558_class_08_5_kaggle_project.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
